[57] ABSTRACT

An alignment network between N parallel data input ports and N parallel data outputs includes a first and a second barrel switch. The first barrel switch, fed by the N parallel input ports, shifts its N outputs and in turn feeds the N-1 input data paths of the second barrel switch according to the relationship $x = k^{y} \bmod N$, wherein x represents the output data path ordering of the first barrel switch, y represents the input data path ordering of the second barrel switch, and k equals a primitive root of the number N. The zero (0) ordered output data path of the first barrel switch is fed directly to the zero ordered output port. The N-1 output data paths of the second barrel switch are connected to the N output ports in the reverse ordering of the connections between the output data paths of the first barrel switch and the input data paths of the second barrel switch. The second switch is controlled by a value m, which in the preferred embodiment is produced at the output of a ROM addressed by the value d, wherein d represents the incremental spacing or distance between data elements to be accessed from the N input ports, and m is generated therefrom according to the relationship $d = k^{m} \bmod N$.

3 Claims, 11 Drawing Figures 
PARALLEL ACCESS ALIGNMENT NETWORK WITH BARREL SWITCH IMPLEMENTATION FOR D-ORDERED VECTOR ELEMENTS

The invention described herein was made in the performance of work under NASA Contract Number NAS 2-9456 and is subject to the provisions of Section 305 of the National Aeronautics and Space Act of 1958 (72 Stat. 435, 42 U.S.C. 2457).

CROSS REFERENCE TO RELATED APPLICATIONS

This is a continuation-in-part of application Ser. No. 820,234, filed July 29, 1977, now U.S. Pat. No. 4,162,534.

In copending application Ser. No. 682,526, now U.S. Pat. No. 4,051,551, for a “Multidimensional Parallel Access Computer Memory System”, filed in the name of D. H. Lawrie et al. and assigned to the assignee of the present invention, there is described and claimed a parallel data processing system for storing and fetching d-ordered vectors. Although not limited thereto, the present alignment network invention may be used with or in such a system.

BACKGROUND OF THE INVENTION 

The present invention relates to an alignment network for use in a parallel data processing environment. More particularly, the present invention finds application in unscrambling a d-ordered vector having its elements stored a distance d apart from each other in the parallel memory modules of a parallel data processor.

In the prior art, as disclosed in U.S. patent application Ser. No. 682,526, now U.S. Pat. No. 4,051,551, filed May 3, 1976, in the names of D. H. Lawrie and C. R. Vora and assigned to the assignee of the present invention, there is described a cross-bar network for transferring and aligning data between a set of parallel memory modules and a set of parallel processors. The cross-bar network so disclosed is relatively easy to program or control; however, it is also relatively costly in components, requiring $N^{2}$ elementary elements through which data is transmitted, wherein N is the number of parallel memory modules storing data to be aligned.

Other prior art networks require fewer components but present difficult control problems. Typical of this type of alignment network is the Benes network, requiring only $2N \log_{2} N$ elements; see Benes, V. E., “Optimal Rearrangeable Multi-stage Connecting Networks, Part 2,” Bell System Technical Journal, Vol. 43, 1964, p. 1641.

Still other prior art alignment networks require fewer components than the cross-bar network and are not too difficult to control or program, but they require multiple data flow transitions cycling through a single alignment layer, thereby increasing the time required for data to pass through the network; see Roger C. Swanson, “Interconnections for Parallel Memories to Unscramble p-ordered Vectors”, IEEE Trans. Computers, November 1974. Swanson’s “p-ordered vectors” correspond to the “d-ordered vector” terminology used herein.

Therefore, it is an object of the present invention to provide an alignment network for d-ordered vectors that requires fewer components than a cross-bar network and yet is easy to control.

It is another object of the invention to provide alignment for d-ordered vectors while requiring only a single pass through any of the elements used for alignment.

SUMMARY OF THE INVENTION 

The above and other objects of the invention are realized through an alignment network for use with a parallel data system having N parallel data input ports and N parallel data output ports, the alignment network therebetween having a first and a second barrel switch. The first barrel switch, fed by the N parallel input ports, shifts its N outputs and in turn feeds the N-1 input data paths of the second barrel switch according to the relationship $x = k^{y} \bmod N$, wherein x represents the output data path ordering of the first barrel switch, y represents the input ordering of the second barrel switch, and k equals a primitive root of the number N. The zero (0) ordered output data path of the first barrel switch is fed directly to the zero (0) ordered output port. The output data paths of the second barrel switch are connected to the N output ports in reverse ordering to the connections between the output data paths of the first barrel switch and the input data paths of the second barrel switch. The second switch is controlled by a value m, which in the preferred embodiment is produced at the output of a ROM addressed by the value d, wherein d represents the incremental spacing or distance between data elements to be accessed from the N input ports, and m is generated therefrom according to the relationship $d = k^{m} \bmod N$.

BRIEF DESCRIPTION OF THE DRAWINGS

The above objects, advantages and features of the present invention will become more readily apparent from a review of the following specification in conjunction with the drawings, wherein:

FIG. 1 is a block diagram illustrating a typical operating environment of the present alignment network invention;

FIG. 2 is a block diagram of an arrangement of the alignment network of the present invention suitable for use in the environment of FIG. 1;

FIG. 3 is a logic diagram of a two-input selection gate used in the alignment network of FIG. 2;

FIG. 4 is a return flow alignment network to complement the alignment network of FIG. 2;

FIG. 5 is an illustration of a read-only memory (ROM) programmed to provide a control word for the alignment networks of FIG. 2 and FIG. 4;

FIG. 6, comprising FIGS. 6A, 6B and 6C, is a presentation in tabular format of the generation of control words for an alignment network operating with 521 parallel memory modules;

FIG. 7 is a diagram of the present alignment invention implemented by a pair of barrel switches; and

FIG. 8 is an illustration of a read-only memory (ROM) programmed to provide the control input for the second barrel switch of FIG. 7.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

With reference to FIG. 1, the alignment network of the present invention interfaces between a plurality of memory modules M0-M6 and a plurality of processing elements P0-P6. Data stored in the memory modules M0-M6 may be accessed in parallel through Memory Ports MP0-MP6, aligned in the Alignment Network 11 as directed by control word m, and fed through Processing Ports PP0-PP6 for parallel processing by the Processing Elements P0-P6. Although seven Memory Modules M0-M6 and seven Processing Elements P0-P6 are shown in FIG. 1, in alternate embodiments other system arrangements having differing numbers of Memory Modules and Processing Elements may be used; see U.S. patent application Ser. No. 682,526, filed May 3, 1976, now U.S. Pat. No. 4,051,551, issued Sept. 27, 1977, to D. H. Lawrie et al. for a “Multidimensional Parallel Access Computer Memory System”, assigned to the assignee of the present invention.

For purposes of illustration, a 5 × 5 two-dimensional matrix comprising Data Elements $a_{11}$ through $a_{55}$ is shown loaded into Memory Modules M0-M6. To process in parallel Data Elements $a_{11}, a_{12}, a_{13}, a_{14}$ and $a_{15}$, the Alignment Network need merely establish a direct data flow path between Memory Ports MP0, MP1, MP2, MP3 and MP4 and Processing Ports PP0, PP1, PP2, PP3 and PP4, respectively. However, to process in parallel Data Elements $a_{11}, a_{21}, a_{31}, a_{41}$ and $a_{51}$, the alignment network must in essence perform a shifting operation to direct the data elements $a_{11}, a_{21}, a_{31}, a_{41}$ and $a_{51}$ to processors P0, P1, P2, P3 and P4, respectively. As can be seen, each data element in the set $a_{11}, a_{21}, a_{31}, a_{41}$ and $a_{51}$ is located five Memory Modules (modulo 7) away from the preceding data element. The offset is taken modulo 7 since there are seven memory modules (M0-M6).

In general, the required shift occurs modulo N, where N equals the number of memory modules.
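FIG. 1 is not reproduced here, but the positions quoted above are consistent with storing the 5 × 5 matrix row by row across the seven modules, so that $a_{ij}$ sits in module $(5(i-1) + (j-1)) \bmod 7$. The minimal Python sketch below assumes that row-major layout (our inference from the quoted positions, not something the text states) and recovers the spacing d of a row and of a column; the helper names are hypothetical.

```python
# Assumed layout: the 5 x 5 matrix is stored row by row, one element per
# module, wrapping around the N = 7 modules (FIG. 1 itself is not shown here).
N = 7      # number of memory modules M0-M6
COLS = 5   # row length of the matrix


def module_of(i: int, j: int) -> int:
    """Memory module holding a_ij (1-based indices) under the assumed layout."""
    return ((i - 1) * COLS + (j - 1)) % N


def spacing(modules: list[int]) -> int:
    """Constant distance d (mod N) between consecutive elements."""
    steps = {(b - a) % N for a, b in zip(modules, modules[1:])}
    if len(steps) != 1:
        raise ValueError("elements are not d-ordered")
    return steps.pop()


row_1 = [module_of(1, j) for j in range(1, 6)]   # a11 .. a15
col_1 = [module_of(i, 1) for i in range(1, 6)]   # a11 .. a51

print(row_1, spacing(row_1))   # [0, 1, 2, 3, 4] 1
print(col_1, spacing(col_1))   # [0, 5, 3, 1, 6] 5
```

The column elements land in modules 0, 5, 3, 1 and 6, each five modules (modulo 7) beyond the last, which is the d = 5 case discussed below.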

For illustrative purposes, a specific example of how the alignment network 11 of the present invention functions will be examined first, followed by a more general approach that extends the application of the present invention to more universal situations. With reference to FIG. 2, the alignment network 11, having seven (7) Memory Ports MP0-MP6 and seven (7) Processing Ports PP0-PP6, is partitioned into a first level 13, a second level 15 and a third level 17.

Each level 13, 15 and 17 includes seven (7) two-input selection gates 19, each having a first input 21, a second input 23, an output 25 and two selection control inputs EO and ES; see FIG. 3. For purposes of discussion, the control inputs of the selection gates 19 in the first level 13 are designated EO" and ES", while the control inputs of the selection gates 19 in the second level 15 are designated EO' and ES'. When a logical one or true level is present on the EO (or EO', EO") control input, a data communications path is provided between the first input 21 and the output 25. With a logical one or true level present on the ES (or ES', ES") control input, a data communications path is provided between the second input 23 and the output 25. All control inputs EO and ES are fashioned to receive complementary binary levels, so that a true or logical one level at EO implies a false or logical zero level at ES, and vice versa. The preferred embodiment fabrication of the simple two-input selection gate 19 is detailed hereinafter.

All selection gates 19 in a given level 13, 15 or 17 may have their control inputs EO and ES connected in parallel. Thus, three bits of a control word m determine the data flow or shifting between the memory ports MP0-MP6 and the processing ports PP0-PP6. The most significant bit of m controls level 17, the second most significant bit controls level 15, and the least significant bit controls level 13. In essence, the control word m drives the ES inputs of the selection gates 19, while the binary complement of m feeds the EO inputs of the gates 19.
With continued reference to FIG. 2, it can be seen that a control word m of 000 would introduce no shifting, and thus direct data flow would occur between memory ports MP0-MP6 and processing ports PP0-PP6, respectively. For a control word m of 100, a shift of 4 (modulo 7) would occur in level 17, with no shift in levels 13 and 15. Likewise, a control word m of 010 would introduce a shift of 2 (modulo 7) in level 15, and a control word m of 001 would introduce a shift of 3 (modulo 7) in level 13. Shifts may, of course, occur in more than one level. For example, a control word m of 111 would generate a shift in all three levels 13, 15 and 17. However, in practice, for the alignment network 11 as shown in FIG. 2, control words 110 and 111 are not required, since the same shift amount occurs using 000 and 001, respectively: the shifts of successive levels compound multiplicatively modulo 7, so that 2 × 4 = 8 ≡ 1 (no net shift) and 3 × 2 × 4 = 24 ≡ 3.
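The stated equivalences (110 acting like 000, and 111 like 001) only hold if the level shifts compound multiplicatively modulo 7: 2 × 4 = 8 ≡ 1 and 3 × 2 × 4 = 24 ≡ 3, whereas adding them would give 6 and 9 ≡ 2. The Python sketch below makes that reading explicit, using the per-level shifts given in the text; the mapping of bits to levels follows the description (least significant bit to level 13), and the function name is ours.

```python
# Per-level shift amounts for the seven-port network of FIG. 2 (k = 3, N = 7),
# as stated in the text: level 13 -> 3, level 15 -> 2, level 17 -> 4.
N = 7
LEVEL_SHIFT = {0: 3, 1: 2, 2: 4}   # bit position of m -> shift of that level


def net_spacing(m: int) -> int:
    """Net effect of control word m, composing the enabled levels.

    Assumes the level shifts compound multiplicatively modulo N; a result
    of 1 means data passes straight through (no net shift).
    """
    total = 1
    for bit, shift in LEVEL_SHIFT.items():
        if (m >> bit) & 1:
            total = (total * shift) % N
    return total


for m in range(8):
    print(f"m = {m:03b} -> net spacing {net_spacing(m)}")
```

For m = 000 through 101 the net value runs through 1, 3, 2, 6, 4, 5, which are exactly the powers $3^{m} \bmod 7$, so every nonzero spacing d is reachable in a single pass; 110 and 111 merely repeat 1 and 3.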

The selection gate 19 is readily fashioned from a first AND gate 27, a second AND gate 29 and an OR gate 37; see FIG. 3. The AND gate 27 is fed by EO and by the direct input 21. The AND gate 29 is fed by ES and by the shift input 23. The OR gate 37 is fed by both AND gates 27 and 29 and provides the output 25. In some logic families, the OR gate 37 may be fabricated as a “wired-OR” rather than as an actual physical gate.
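A bit-level model of the gate in Python, assuming one data bit per gate (a multibit word would simply use one such gate per bit); the function names are illustrative only. The second function restates the rule that each bit of m drives the ES inputs of one whole level, with its complement on EO.

```python
def selection_gate(direct_in: int, shift_in: int, es: int) -> int:
    """Selection gate 19 (FIG. 3) for one data bit.

    EO is driven with the complement of ES, so exactly one AND gate passes
    its data input; the OR gate merges the two AND outputs.
    """
    eo = 1 - es                  # complementary control level on EO
    and_27 = eo & direct_in      # AND gate 27: EO with direct input 21
    and_29 = es & shift_in       # AND gate 29: ES with shift input 23
    return and_27 | and_29       # OR gate 37 (or a wired-OR) drives output 25


def es_level(m: int, level_index: int) -> int:
    """ES level shared by all gates of one level.

    level_index 0, 1, 2 correspond to levels 13, 15, 17 (LSB to MSB of m).
    """
    return (m >> level_index) & 1


# Truth table: ES = 0 selects the direct input, ES = 1 the shift input.
for es in (0, 1):
    for direct_in in (0, 1):
        for shift_in in (0, 1):
            print(es, direct_in, shift_in, "->", selection_gate(direct_in, shift_in, es))
```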

The selection gate 19 fabrication described above is unidirectional in that it provides data flow only from the memory ports MP0-MP6 to the processing ports PP0-PP6. Therefore, a reverse path must be provided to permit data to flow from the processing ports PP0-PP6 to the memory ports. Such reverse flow is easily provided by a first level 13', a second level 15' and a third level 17'; see FIG. 4.

Each level 13', 15' and 17' includes seven (7) two-input selection gates 19 for transferring data back to the memory ports MP0-MP6 in the same manner in which the data was transferred to the processing ports PP0-PP6 (see FIG. 2). By comparing FIG. 2 with FIG. 4, one can see that under the control of a simple control word m, data can be pulled from the memory ports MP0-MP6, sent to the desired processing ports PP0-PP6, and returned to the memory ports MP0-MP6 from whence it came. Each level 13, 15 and 17 of FIG. 2 corresponds to level 13', 15' and 17', respectively, of FIG. 4, in that the reversed data flow is channeled back to the memory ports MP0-MP6 in the same manner in which it flowed to the processing ports PP0-PP6.
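To illustrate the round trip, the Python sketch below treats the whole forward network as a gather in which processing port j receives memory port $j \cdot d \bmod N$, and the FIG. 4 network as the matching scatter back along the same paths. That model is our assumption, not a gate-level transcription of FIGS. 2 and 4; it does reproduce the FIG. 1 column example, where d = 5 sends memory port 5 to processing port 1. The function names are hypothetical.

```python
N = 7  # seven memory ports and seven processing ports, as in FIGS. 2 and 4


def forward(memory_words: list, d: int) -> list:
    """Assumed FIG. 2 behaviour: processing port j receives memory port j*d mod N."""
    return [memory_words[(j * d) % N] for j in range(N)]


def reverse(processor_words: list, d: int) -> list:
    """Assumed FIG. 4 behaviour: the word at processing port j returns to
    memory port j*d mod N, i.e. along the same path it came from."""
    memory_words = [None] * N
    for j, word in enumerate(processor_words):
        memory_words[(j * d) % N] = word
    return memory_words


memory = [f"M{i}" for i in range(N)]
fetched = forward(memory, 5)          # d = 5 fetches a matrix column in FIG. 1
print(fetched)   # ['M0', 'M5', 'M3', 'M1', 'M6', 'M4', 'M2']
assert reverse(fetched, 5) == memory  # round trip restores the original layout
```

Because N is prime and d is nonzero, $j \mapsto j \cdot d \bmod N$ is a permutation, so the scatter fills every memory port exactly once.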

The alignment network 11 described above for a system having seven (7) memory ports may be extended to the general case wherein the number of memory ports equals N. In the general case, the alignment network 11 includes a plurality of levels, each level including N two-input selection gates 19. The number of levels is equal to $\log_{2} N$ rounded up to the nearest integer, $\lceil \log_{2} N \rceil$. In the above example, N equalled 7 and $\lceil \log_{2} 7 \rceil$ equalled 3. The total number of gates 19 required in the general case is then $N \lceil \log_{2} N \rceil$.
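The component saving over the prior-art cross-bar can be checked numerically. A short Python sketch using the text's count of $N \lceil \log_{2} N \rceil$ two-input gates against $N^{2}$ cross-bar elements; $\lceil \log_{2} N \rceil$ is computed with integer bit length rather than floating-point logarithms to avoid rounding surprises.

```python
def levels(n: int) -> int:
    """ceil(log2 N) computed exactly with integers (valid for N >= 2)."""
    return (n - 1).bit_length()


def gate_count(n: int) -> int:
    """Two-input selection gates: N gates per level times ceil(log2 N) levels."""
    return n * levels(n)


for n in (7, 17, 521):
    print(f"N = {n:3d}: {levels(n)} levels, {gate_count(n)} gates "
          f"vs {n * n} cross-bar elements")
```

For N = 7 this gives 21 gates against 49 cross-bar elements, and for N = 521 it gives 5210 against 271441.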

Each level of the alignment network 11 either allows data to flow directly through or provides a data shift, depending upon the control word m and, more particularly, upon the voltage levels applied to the ES and EO inputs of each selection gate 19. The amount of shift in each level is equal to $k^{2^{(L-1)}} \bmod N$, where k is relatively prime to N and is a primitive root of N, N is the number of memory modules, and L is the level ranking within the alignment network 11. For example, referring to FIG. 2, wherein k = 3, the shift occurring in the first level 13 is $3^{2^{(1-1)}} \bmod 7 = 3$. In the second level 15, the shift is $3^{2^{(2-1)}} \bmod 7 = 2$. The third level 17 shift is $3^{2^{(3-1)}} \bmod 7 = 4$.
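The per-level shift formula translates directly into modular exponentiation; Python's built-in three-argument `pow` computes $k^{e} \bmod N$. The sketch assumes, as in the text, $\lceil \log_{2} N \rceil$ levels; the function name is ours.

```python
def level_shifts(n: int, k: int) -> list[int]:
    """Shift amount k**(2**(L-1)) mod N for levels L = 1 .. ceil(log2 N)."""
    num_levels = (n - 1).bit_length()        # ceil(log2 N) for N >= 2
    return [pow(k, 2 ** (level - 1), n) for level in range(1, num_levels + 1)]


print(level_shifts(7, 3))    # [3, 2, 4]  (levels 13, 15, 17 of FIG. 2)
print(level_shifts(17, 3))   # [3, 9, 13, 16, 1]
```

Each level's shift is the square of the previous one. For N = 17 the fifth level's shift comes out as 1 (straight through), reflecting that m never exceeds N - 2 = 15 and so fits in four bits.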

In operation, the distance d (which is the distance between the elements sought to be accessed) is known, and the value m must be generated. For example, in FIG. 1, to access the elements $a_{11}, a_{12}, a_{13}, a_{14}$ and $a_{15}$, the distance d is unity and no shifting is required through the alignment network. Hence, it is clear in this case that m must equal zero. However, to access the data elements $a_{11}, a_{21}, a_{31}, a_{41}$ and $a_{51}$, the distance d is equal to five (5), and the control word m must be calculated to generate the proper shift through the alignment network 11.

The calculation of m is derived from the relationship $d = k^{m} \bmod N$; see FIG. 5, which illustrates the generation of m for the system of FIG. 2. In the preferred embodiment, the value d is used to address a ROM which has been programmed according to the equation $d = k^{m} \bmod N$ to produce the value m at address d. FIG. 5 illustrates the generation of m for values of d in a system having k = 3 and N = 7. Alternatively, of course, m could be generated by software given the values of d, k and N. However, hardware generation of m is preferred, since in parallel processors speed is nearly always of the essence.
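FIG. 5 is not reproduced here; the Python sketch below regenerates the kind of table the ROM holds (address d to control word m) for k = 3 and N = 7 by stepping through the powers of k. It is one possible way of computing the contents offline, not the patent's own procedure, and `build_rom` is a hypothetical name.

```python
def build_rom(n: int, k: int) -> dict[int, int]:
    """Control ROM contents: address d -> control word m with d = k**m mod N.

    Since k is a primitive root of N, every nonzero d in 1 .. N-1 appears
    exactly once; address 0 is never used.
    """
    rom = {}
    power = 1
    for m in range(n - 1):
        rom[power] = m
        power = (power * k) % n
    if len(rom) != n - 1:
        raise ValueError("k is not a primitive root of N")
    return rom


rom = build_rom(7, 3)
for d in sorted(rom):
    print(f"d = {d} -> m = {rom[d]:03b}")
```

The column access of FIG. 1 (d = 5) maps to m = 101, enabling levels 13 and 17, whose shifts 3 and 4 multiply to 12 ≡ 5 modulo 7.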

The alignment network 11 shown in FIG. 2 was developed for a system of seven (7) memory modules and a k of 3. Other arrangements may, of course, be developed. For example, in a system having seventeen (17) memory modules, a k of 3, 5, 6, 7, 10, 11, 12 or 14 may be used. FIGS. 6A-6C, positioned as shown in FIG. 6, illustrate in tabular format the generation of m for a system having k = 3 and the number N of memory modules equal to 521.
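The list of usable k values for seventeen modules can be reproduced by testing which residues are primitive roots. The Python sketch below assumes N is prime (true for 7, 17 and 521) and simply checks that the powers of k visit every nonzero residue; it is a brute-force check, not an efficient primitive-root algorithm.

```python
def is_primitive_root(k: int, n: int) -> bool:
    """True if the powers of k modulo the prime n run through all of 1 .. n-1."""
    seen = set()
    power = 1
    for _ in range(n - 1):
        power = (power * k) % n
        seen.add(power)
    return len(seen) == n - 1


print([k for k in range(2, 17) if is_primitive_root(k, 17)])
# [3, 5, 6, 7, 10, 11, 12, 14]
print([k for k in range(2, 7) if is_primitive_root(k, 7)])
# [3, 5]
```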

Other arrangements of the present invention may be fabricated. As an illustrative example, referring to FIG. 2, levels may be combined in parallel rather than in series. If two levels were combined, for example level 13 and level 15, each selection gate 19 would require four inputs instead of two, to provide for the shift required in level 13, the shift of level 15, the combined shift of levels 13 and 15, and direct through data flow. Hence, the design trade-off is the complexity of the gates 19 versus an increased number of gates 19 and an increased number of levels.

Parallelism may be carried even further by combining all levels 13, 15 and 17 and using eight-input selection gates 19.

Further, in certain applications it may be desirable to insert data storing, shifting or processing apparatus between the alignment network of the present invention and the parallel memory modules storing the data to be aligned. For example, one such apparatus would be an Electronic Barrel Switch for Data Shifting of the type disclosed by R. A. Stokes et al. in U.S. Pat. No. 3,610,903. The disclosed Barrel Switch comprises a matrix of gates arranged in a rectangular configuration and adapted to shift a multibit parallel input, in a single clock time, a preselected number of places to the left or right, either end-off or end-around. Inserting the Barrel Switch permits d-ordered vectors stored in memory at various starting or base locations to be shifted to a left-most memory starting location for processing through the alignment network.

Other obvious modifications are apparent. For example, with reference to FIGS. 2 and 4, it can be appreciated that the left-most selection gate 19 in all levels 13, 15 and 17 provides only direct through data flow regardless of control word m (zero multiplied by any shift amount remains zero modulo N). Hence, in many applications, the left-most selection gate 19 may be deleted.

With reference now to FIG. 7, the present invention may be implemented through the use of a first barrel switch 33 and a second barrel switch 35. As may be appreciated by a comparison with FIG. 2 and a review of the discussion relating thereto, the implementation of FIG. 7 performs an alignment between a plurality of memory ports MP0 through MP16 and a plurality of processor ports PP0 through PP16. It is appreciated that alignment may be implemented for any number of memory and processor ports, such as the seven (7) shown in FIG. 2 and FIG. 5 or the five hundred twenty-one (521) detailed in FIGS. 6A-6C.

The function of the first barrel switch 33 is merely to shift, when required, the stored d-ordered vectors to the left-most starting position for processing through the barrel switch 35. The ordered output data paths 1 through 16 of the first barrel switch 33 are connected to the ordered input data paths 0 through 15 of the second barrel switch 35 in the sequence suggested in FIG. 8 by the equation $x = k^{y} \bmod N$, wherein x represents the ordering of the output data paths of the first barrel switch 33 and y represents the ordering of the input data paths of the second barrel switch 35. N is the number of memory ports and k is a primitive root of N. For the embodiment shown, N equals 17 and k equals 3. The zero (0) ordered output data path of the first barrel switch 33 is connected directly to the zero (0) ordered processor port.

The ordered output data paths 0 through 15 of the second barrel switch 35 are connected to the processor ports PP1 through PP16 in a sequence opposite to the sequence interconnecting the first barrel switch 33 and the second barrel switch 35.
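The connection rule $x = k^{y} \bmod N$ can be tabulated directly. A minimal Python sketch for the FIG. 7 values N = 17 and k = 3, listing which output path of the first barrel switch 33 feeds each input path of the second barrel switch 35; path 0 is absent because it bypasses the second switch. The table is computed here, not copied from FIG. 8.

```python
N, K = 17, 3   # FIG. 7 embodiment: seventeen ports, primitive root 3

# Input path y of switch 35 (y = 0 .. N-2) is fed by output path x = K**y mod N
# of switch 33; keyed by x for easy lookup.
first_to_second = {pow(K, y, N): y for y in range(N - 1)}

for x in sorted(first_to_second):
    print(f"switch 33 output {x:2d} -> switch 35 input {first_to_second[x]:2d}")
```

Because 3 is a primitive root of 17, the sixteen powers cover output paths 1 through 16 exactly once, so the wiring is a one-to-one connection.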

The control operations for the above-described FIG. 7 implementation are relatively simple. First, an S control input is provided to the first barrel switch 33 to shift the starting data element of a stored d-ordered vector to the left-most (0) output data path of the first barrel switch 33. Second, an m control input is provided to the second barrel switch 35 to produce the desired shift increment therein. The desired shift increment is equal to the distance d (which is the distance between the data elements sought to be accessed).
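The two control steps can be simulated end to end. The Python sketch below rests on assumptions the text does not fix: both barrel switches rotate end-around, the first so that output path x carries memory port $(x + s) \bmod N$ and the second so that output z carries input $(z + m) \bmod (N-1)$; and, following claim 1's "same ordering sequence", output path z of the second switch drives processor port $k^{z} \bmod N$. Under those assumptions, processor port j receives memory port $(s + j \cdot d) \bmod N$ for every start s and spacing d. The function names are hypothetical.

```python
N, K = 17, 3   # FIG. 7 embodiment: 17 ports, primitive root 3


def control_word(d: int) -> int:
    """m with d = K**m mod N, i.e. what the FIG. 8 ROM would output for address d."""
    for m in range(N - 1):
        if pow(K, m, N) == d:
            return m
    raise ValueError("d must lie in 1 .. N-1")


def align(memory_ports: list, s: int, d: int) -> list:
    """Route a d-ordered vector starting at memory port s to processor ports.

    Assumptions (see lead-in): end-around rotations in both barrel switches,
    and output path z of switch 35 wired to processor port K**z mod N.
    """
    # Barrel switch 33 (S control): output path x carries memory port x + s.
    first_out = [memory_ports[(x + s) % N] for x in range(N)]

    # Fixed wiring: input y of switch 35 is fed by output K**y mod N of switch 33.
    second_in = [first_out[pow(K, y, N)] for y in range(N - 1)]

    # Barrel switch 35 (m control): rotate its N-1 paths by m.
    m = control_word(d)
    second_out = [second_in[(z + m) % (N - 1)] for z in range(N - 1)]

    # Output wiring: path 0 of switch 33 goes straight to port 0,
    # and output z of switch 35 drives port K**z mod N.
    processor_ports = [None] * N
    processor_ports[0] = first_out[0]
    for z, word in enumerate(second_out):
        processor_ports[pow(K, z, N)] = word
    return processor_ports


memory = [f"M{i}" for i in range(N)]
ports = align(memory, s=4, d=5)
print(ports[:4])   # ['M4', 'M9', 'M14', 'M2']
```

The key step is that input y of switch 35 carries the word at first-switch path $k^{y}$, so rotating by m multiplies every path index by $k^{m} = d$, turning the spacing-d vector into consecutive processor ports.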

The calculation of m is derived from the relationship $d = k^{m} \bmod N$; see FIG. 8, which illustrates the calculation of m for the system of FIG. 7. In the preferred embodiment, the value d is used to address a ROM which has been programmed according to the equation $d = k^{m} \bmod N$ to produce the value m at address d. FIG. 8 illustrates the generation of m for values of d in a system having k = 3 and N = 17. Alternatively, of course, m could be generated by software given the values of d, k and N. However, hardware generation of m is preferred, since in parallel processors speed is nearly always of the essence.
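A ROM is just a lookup array indexed by address, so the FIG. 8 contents can be sketched as a flat Python list for k = 3 and N = 17. The layout (address 0 left unused, 4-bit entries because m never exceeds N - 2 = 15) is our assumption about how such a ROM would be organized; FIG. 8 itself is not reproduced here.

```python
N, K = 17, 3   # FIG. 7 / FIG. 8 embodiment

# Flat lookup array indexed by address d; address 0 stays None because
# d = 0 is not a valid spacing. Each stored m fits in 4 bits (0 .. 15).
ROM = [None] * N
for m in range(N - 1):
    ROM[pow(K, m, N)] = m

for d in range(1, N):
    print(f"address {d:2d}: m = {ROM[d]:2d} ({ROM[d]:04b})")
```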

Other arrangements of the present invention may be fabricated. For example, in applications wherein d-ordered vectors are stored with their starting data elements all available at memory port MP0, the first barrel switch 33 is not required and may be deleted. Also, the alignment network is not, of course, limited to being interposed between memory and processor ports, but may be interposed between any set of parallel ports between which alignment is desired.
Thus, while particular embodiments of the present invention have been described and illustrated, it will be apparent to those skilled in the art that changes and modifications may be made therein without departing from the spirit and scope of the invention.

What is claimed is: 

1. A parallel data access alignment network for aligning data between N ordered input ports and N ordered output ports, wherein N is an integer greater than one, said data comprising d-ordered vector data elements spaced d modulo N input ports apart, said network comprising:

a first barrel switch having N ordered input data paths and N ordered output data paths, said N ordered input data paths thereof being connected in direct sequential order to the N ordered input ports, said first barrel switch providing a data path connection for a data element in said d-ordered vector data elements to the lowest ordered output data path of said first barrel switch in said N ordered output data paths thereof;

a second barrel switch having N-1 ordered input data paths and N-1 ordered output data paths and being responsive to a shift control signal for shifting data flow therebetween, said N-1 ordered input data paths thereof being connected to the N-1 highest ordered data output paths of said N ordered output data paths of said first barrel switch according to the relationship $x = k^{y} \bmod N$, wherein x represents the output data path ordering of said first barrel switch, y represents the input data path ordering of said second barrel switch and k represents a primitive root of N, and said N-1 ordered output data paths of said second barrel switch being connected to the N-1 highest ordered output ports of said N ordered output ports in the same ordering sequence by which said N-1 ordered input data paths of said second barrel switch are connected to said N-1 highest ordered output data paths of said first barrel switch;
data path means for connecting the lowest ordered data path output of said N ordered data path outputs of said first barrel switch to the lowest ordered output port of said N ordered output ports; and

shift control means for generating said shift control signal and providing same to said second barrel switch, said shift control signal generated from the relationship $d = k^{m} \bmod N$, wherein d is said d-ordered vector data element input port spacing, k is said primitive root of N, N is said integer greater than one, and m is said shift control signal specifying the amount of shift in said second barrel switch between said N-1 ordered data output paths thereof and said N-1 ordered data input paths thereof.

2. The parallel data access alignment network according to claim 1, wherein said shift control means includes:

a memory addressed by said d and outputting said m in accord with said relationship $d = k^{m} \bmod N$.

3. The parallel data access alignment network according to claim 2, wherein said memory is a read-only memory.

***** 

